1. INTRODUCTION

Accurate short-term traffic forecasting facilitates the daily lives of citizens by providing them with the
best routes, which reduces the time and effort spent on trips, and it helps government agencies develop
appropriate plans to prevent traffic congestion [1]. Efficient prediction of traffic flow has therefore
become a crucial part of intelligent transportation systems (ITSs). However, short-term traffic flow
forecasting is extremely difficult because of the non-linearity of traffic data. With the proliferation of
ITSs in recent years, the ability to estimate traffic flow precisely and effectively has drawn considerable
interest [2]. Traffic flow prediction is the process of predicting the traffic flow distribution from
historical traffic flow data. It comprises both short-term and long-term forecasts. The horizon of
short-term traffic forecasts ranges from five minutes to one hour [3]. Real-time and reliable data on
short-term traffic changes can reduce traffic strain, prevent accidents, and alleviate traffic congestion
to some extent. Short-term traffic forecasting has therefore become an important research area in the
traffic flow prediction field [4].

Advances in data collection and sensor technologies provide more robust traffic flow data and modeling.
Traffic flow data are characterized by non-linearity, periodicity, randomness, and volatility [5].
Prediction with traditional methods, such as autoregressive integrated moving average (ARIMA), support
vector regression (SVR), and multi-variable linear regression, is therefore very difficult. These methods do
not account for the full range of traffic flow features and so do not achieve accurate traffic forecasting [6].
Continuous advancements in machine learning and deep learning theory create new challenges for ITS
development. Due to their robust non-linear fitting capabilities, deep learning networks are commonly
employed for short-term traffic flow predictions [7].

This study introduces two novel models. The first model uses two long short-term memory (LSTM)
units that extract the temporal features of traffic flow, followed by four dense layers that perform the
traffic flow prediction. The second model uses two gated recurrent units (GRUs) that extract the temporal
features of traffic flow, followed by three dense layers that perform the traffic flow prediction. The main
advantage of these models is their ability to effectively capture the complex non-linearity of traffic flow,
which improves the traffic flow prediction accuracy. The remainder of the paper is organized as follows:
section 2 presents the related work on traffic flow prediction, section 3 presents the proposed method,
section 4 explains the experimental work and results, and section 5 concludes the paper.


2. RELATED WORK 

The increasing volume of traffic has impacted the viability of urban growth. Many strategies have
been investigated for forecasting traffic flow with high accuracy, taking into consideration a variety of
factors and characteristics. There are typically four main types of traffic flow prediction techniques:
parametric techniques, non-parametric techniques, hybrid techniques, and deep learning techniques.


2.1. Parametric techniques 

The most commonly employed parametric model is the ARIMA model, a time-series technique. It is
the standard approach for estimating the flow of traffic over the next few short periods of time, and it is
also used to predict the flow of traffic on expressways and in urban areas [8]. Numerous variations of the
ARIMA model were subsequently presented to improve prediction performance. For instance, the KARIMA
model combines ARIMA with the Kohonen network to address the non-linearity of the data, which the ARIMA
model alone cannot handle [9]. An ARIMAX model, which combines ARIMA with explanatory variables, was
suggested to increase forecasting performance [10]. Ghosh et al. [11] developed the Bayesian seasonal ARIMA
model, which utilizes the Bayesian method to increase forecasting performance. Duan et al. [12] developed
the spatio-temporal ARIMA (STARIMA) model to extract the spatio-temporal features of traffic flow data and
achieve more precise prediction performance. However, parametric models cannot describe traffic flow
accurately using analytical formulas because of its non-linear, stochastic nature. Therefore, there is great
interest in non-parametric models for traffic flow forecasting.


2.2. Non-parametric techniques 

Non-parametric models include approaches such as support vector regression, k-nearest neighbor
(k-NN) methods, and artificial neural networks (ANNs) [13], [14]. For instance, k-NN has been used to
estimate short-term traffic flow, although its performance is lower than that of linear time-series
methods [15]. Chang et al. [16] proposed a model that utilizes an abundance of past data to enhance
forecasting performance. Jeong et al. [17] suggested an SVR with weighted online learning to increase
forecasting performance. In addition, numerous ANN-based models have been presented in [18]-[22] for
traffic flow forecasting. However, early neural network-based works typically employed shallow networks,
which were incapable of capturing the uncertainty and complicated non-linearity of traffic flow [2].
Non-parametric models are therefore insufficient for achieving reliable prediction performance.


2.3. Hybrid techniques 

A hybrid technique integrates two or more models to increase performance. Hybrid techniques are
used to cope with the dynamic changes in traffic flow, which make traffic flow uncertain. On this basis,
hybrid models have been proposed that adapt to these changes to predict the behavior of vehicles in the
short term. For instance, an aggregation model is employed in [23] to achieve more precise prediction
performance. A method integrating the ARIMA model with cumulative sum algorithms was introduced in [24]
for traffic flow forecasting. Dimitriou et al. [25] proposed an adaptive hybrid fuzzy rule-based method
for traffic flow forecasting.


2.4. Deep learning techniques 

Deep learning can increase the prediction performance of traffic flow, saving time and costs [26].
Due to the stochastic nature and the non-linearity of urban mobility data, deep learning is utilized in several
studies to identify patterns and develop suitable forecasting models for urban mobility data [27], [28].
In recent years, numerous deep learning techniques have been developed for traffic flow forecasting. The
first model to apply the deep learning approach was introduced in [29], where feature learning is done using
a deep belief network and training is done using a multitask learning method.

A stacked autoencoder (SAE) model was also suggested in [30] for the prediction of traffic flow to
extract the spatial and temporal characteristics of traffic data. A deep learning method employing LSTM was
described in [31] to capture the temporal characteristics of traffic flow. Jia et al. [32] proposed a deep
learning model that uses LSTM to estimate the flow of traffic under rainfall conditions more accurately.
Polson and Sokolov [33] proposed a deep learning linear model that achieves more precise prediction
performance by capturing the non-linear and spatio-temporal effects in the traffic data.

Du et al. [34] proposed a hybrid multimodal deep learning system that adequately predicts complicated,
non-linear urban traffic flow by incorporating the representation properties of multimodal traffic data.
Xiao and Yin [35] presented a hybrid LSTM model that can deal with diverse traffic situations; it has a
smaller prediction error than other models but requires a somewhat longer running time. Wu et al. [36]
proposed a model that uses a CNN to capture the spatial features and an LSTM to extract the temporal
features of traffic flow. Bruna et al. [37] proposed a model that extends the CNN to a graph convolutional
neural network (GCN) to learn the features of graph data. Yu et al. [38] proposed a GCN-based model for
traffic flow forecasting that captures the spatio-temporal features of traffic flow data. Wei et al. [39]
proposed a model that uses an autoencoder to extract the upstream and downstream characteristics of traffic
flow data; the LSTM network then uses the characteristics acquired by the autoencoder together with previous
data to achieve more precise prediction performance. Liu et al. [40] proposed a model to forecast bus
traffic flow. Zhang et al. [41] proposed a deep autoencoder model that forecasts traffic flow by extracting
the temporal correlations in the traffic flow data. To overcome the complex non-linearity of traffic flow
data, this study introduces two novel models. The first model uses an LSTM network and the second model
uses a GRU network; both extract the temporal features of traffic flow more efficiently.


3. THE PROPOSED MODEL 
3.1. Dataset 

This study evaluates the two proposed models on two different traffic datasets. The first dataset is
the performance measurement system (PEMS) collection, in which data are aggregated in real time from
sensors installed on freeways in all of California's main metropolitan regions. The PEMS dataset is widely
used as a public benchmark for traffic flow forecasting. PEMS delivers real-time data from more than 39,000
sensors distributed statewide across California's freeway systems [42]. The average traffic flow of a
freeway is determined by averaging the traffic data obtained by several detectors. The dataset contains
7,776 training instances and 4,320 testing instances.

The second dataset is the traffic and congestions (TRANCOS) [43] dataset. It is the first dataset for
counting vehicles in images of traffic jams captured with real-world traffic monitoring cameras, and it is
frequently used to assess the generalizability of vehicle counting techniques. The selected cameras monitor
various motorways in the Madrid area, which are notorious for their intense traffic congestion. Each image
has been annotated with the precise number of vehicles and their locations; 46,796 vehicles have been
annotated in total. Every collected image shows traffic congestion, and the images span diverse scenarios
and angles with varying lighting conditions, degrees of crowdedness, and overlap, even within the same
image. In this study, however, the TRANCOS dataset is used as metadata. This is the first time that it has
been used in this form, by extracting all possible features from each image in the dataset, such as the
number of vehicles and the time and date at which the image was captured. The dataset has been divided
into a train split of 1,031 instances and a test split of 213 instances.


3.2. Model architecture 

To address the non-linearity of traffic flow data, this study introduces two novel models that achieve
more precise prediction performance, as shown in Figure 1. The first model is based on an LSTM network.
The second model is based on a GRU network. Both extract the temporal features of traffic flow in order to
predict it efficiently in the near future and thereby increase transportation efficiency.


Figure 1. Model architecture diagram 


3.2.1. Pre-processing phase 

Data normalization is a crucial step in data preprocessing. It ensures the quality of the input data
fed to deep learning models and is necessary when features have varying value ranges. In this stage, data
normalization transforms the features to a similar scale, which enhances the model's performance and
training stability and reduces the training time. There are different kinds of data normalization methods,
including linear scaling, clipping, Z-score, decimal scaling, and log scaling. In this study, the most
popular normalization method, linear scaling to unit range, is used [44].
a. Linear scaling

Linear scaling is the simplest type of scaling. It is a linear transformation technique that normalizes
the data to a range, usually [0, 1] or [-1, 1]. With a lower bound $\min(x_i)$ and an upper bound
$\max(x_i)$ of an attribute $x_i$, the normalized value $x_i'$ is given by:

$$x_i' = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)} \qquad (1)$$
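
The linear scaling in (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' preprocessing code: the function name `min_max_scale`, the optional target range, the handling of a constant feature, and the sample flow values are all assumptions made for the example.

```python
def min_max_scale(values, lower=0.0, upper=1.0):
    """Linearly rescale a sequence of numbers to [lower, upper], following (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature has no range, so map it to the lower bound.
        return [lower for _ in values]
    span = hi - lo
    return [lower + (v - lo) / span * (upper - lower) for v in values]


# Hypothetical traffic flow readings, used only for illustration.
flows = [120, 80, 200, 160]
scaled = min_max_scale(flows)            # unit range [0, 1]
symmetric = min_max_scale(flows, -1, 1)  # alternative range [-1, 1]
print(scaled)
print(symmetric)
```

In practice, the minimum and maximum are usually computed on the training split only and then reused for the test split, so that no information from the test data leaks into preprocessing.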


3.2.2. Learning phase 

The first model starts with two LSTM units that are used to extract the traffic flow temporal 
features. These temporal features are combined into a feature vector followed by four dense layers to perform 
the traffic flow prediction, as shown in Figure 2. The second model starts with two GRU units that are used 
to extract the traffic flow temporal features. These temporal features are combined into a feature vector 
followed by three dense regression layers to perform the traffic flow prediction, as shown in Figure 3. The 
mean squared error loss function is used as the objective function, which is discussed later. LSTM and GRU 
are discussed in detail in the following subsections. 


Figure 2. Model framework of LSTM 
Figure 3. Model framework of GRU


a. Long short-term memory

The traditional recurrent neural network (RNN) design suffers from the so-called vanishing gradient
problem. To overcome this disadvantage, certain RNN structures, such as the LSTM first introduced in
[45], were built to give memory cells the capacity to determine when to forget information, and hence to
determine the ideal time lags for time series problems. Owing to this long-lasting memory capacity, these
characteristics are particularly desirable for short-term traffic flow prediction in the transportation
domain. The structure of LSTM NN cells is shown in Figure 4. A typical LSTM has three multiplicative units:
the input gate, the forget gate, and the output gate. The input gate is used to memorize some information
from the present, the forget gate is used to choose to forget some information from the past, and the
output gate uses all calculated results to produce the output of the LSTM NN cell [46].

In Figure 4, $f_t$ denotes the forget gate, $i_t$ denotes the input gate, and $o_t$ denotes the output
gate, which together control the cell state $C_t$ that stores the future and historical information. The
input and output of the LSTM unit are $x_t$ and $h_t$, respectively. The processing operation of the LSTM
unit can be expressed using (2)-(7):

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \qquad (2)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \qquad (3)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \qquad (4)$$
$$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t \qquad (5)$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \qquad (6)$$
$$h_t = o_t \times \tanh(C_t) \qquad (7)$$


The forget gate is represented in (2); it receives the weighted sum of the input at time t and the cell
output at time t - 1, and its activation function is the sigmoid function. The input gate is represented
in (3) and takes the same inputs as (2). The candidate memory value at time t is calculated in (4), where
the activation function is a tanh function. The present and past memories are combined in (5). The output
gate is represented in (6) and takes the same inputs as (2). The output of the cell is represented in (7),
where the activation function is a tanh function. In all equations, W is a weight matrix and b is a bias
vector.
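
To make the data flow of (2)-(7) concrete, the following pure-Python sketch evaluates one LSTM time step. It is an illustration under stated assumptions rather than the implementation used in this study: the helper names (`affine`, `lstm_step`, `constant_params`), the dictionary layout of the parameters, the hidden size of 2, and the constant weight value 0.1 are all hypothetical, and a real model would learn its weights during training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W . v + b for a list-of-rows matrix W and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; comments give the matching equation number."""
    concat = h_prev + x_t  # the concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(params["W_f"], concat, params["b_f"])]      # (2)
    i_t = [sigmoid(z) for z in affine(params["W_i"], concat, params["b_i"])]      # (3)
    c_hat = [math.tanh(z) for z in affine(params["W_c"], concat, params["b_c"])]  # (4)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]          # (5)
    o_t = [sigmoid(z) for z in affine(params["W_o"], concat, params["b_o"])]      # (6)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                            # (7)
    return h_t, c_t


def constant_params(hidden, n_in, value):
    """Hypothetical helper: every weight and bias set to the same value."""
    params = {}
    for gate in ("f", "i", "c", "o"):
        params["W_" + gate] = [[value] * (hidden + n_in) for _ in range(hidden)]
        params["b_" + gate] = [value] * hidden
    return params


# Run three hypothetical normalized flow values through the cell.
params = constant_params(hidden=2, n_in=1, value=0.1)
h, c = [0.0, 0.0], [0.0, 0.0]
for x in [0.2, 0.5, 0.9]:
    h, c = lstm_step([x], h, c, params)
print(h, c)
```

Because the forget gate multiplies the previous cell state element-wise in (5), a gate value close to 1 carries past information forward almost unchanged, which is what lets the cell retain long time lags.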


b. Gated recurrent unit 

The GRU was first proposed in [47]. It is similar to the LSTM but simpler to implement and compute.
The GRU cell structure is shown in Figure 5. A GRU cell consists of two gates: the reset gate r and the
update gate z. As in the LSTM cell, the output of the hidden state at time t depends on the hidden state
at time t - 1. The function of the forget gate in the LSTM is similar to that of the reset gate in the GRU.
In this study, the GRU NNs use the same regression part and optimization method as the LSTM NNs, and the
hidden state at time t is calculated from the previous hidden state and the input time series value at
time t, as in (8):

$$h_t = f(h_{t-1}, x_t) \qquad (8)$$
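
Equation (8) leaves the function f abstract. The sketch below fills it in with the commonly used GRU gate equations, which are not spelled out in this paper and should therefore be read as an assumption: an update gate z_t, a reset gate r_t, a candidate state computed from the reset-scaled previous state, and the interpolation h_t = (1 - z_t) * h_{t-1} + z_t * candidate. Some references swap the roles of z_t and 1 - z_t. The helper names and the constant weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W . v + b for a list-of-rows matrix W and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def gru_step(x_t, h_prev, params):
    """One GRU time step, i.e. one concrete choice of f in h_t = f(h_{t-1}, x_t)."""
    concat = h_prev + x_t
    z_t = [sigmoid(v) for v in affine(params["W_z"], concat, params["b_z"])]  # update gate
    r_t = [sigmoid(v) for v in affine(params["W_r"], concat, params["b_r"])]  # reset gate
    # The reset gate scales the previous state before the candidate is computed.
    gated = [r * h for r, h in zip(r_t, h_prev)] + x_t
    h_hat = [math.tanh(v) for v in affine(params["W_h"], gated, params["b_h"])]
    # Assumed convention: z_t weights the candidate, (1 - z_t) the previous state.
    return [(1.0 - z) * h + z * g for z, h, g in zip(z_t, h_prev, h_hat)]


def constant_params(hidden, n_in, value):
    """Hypothetical helper: every weight and bias set to the same value."""
    params = {}
    for gate in ("z", "r", "h"):
        params["W_" + gate] = [[value] * (hidden + n_in) for _ in range(hidden)]
        params["b_" + gate] = [value] * hidden
    return params


# Run three hypothetical normalized flow values through the cell.
params = constant_params(hidden=2, n_in=1, value=0.1)
h = [0.0, 0.0]
for x in [0.2, 0.5, 0.9]:
    h = gru_step([x], h, params)
print(h)
```

Compared with the LSTM cell, the GRU keeps a single state vector and three weight matrices instead of four, which is the source of its lower computational cost.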
Figure 4. Structure of LSTM NN cells
Figure 5. Structure of GRU cells


c. Training details 

The two models are trained using the RMSprop optimizer with a base learning rate of α = 0.0005, chosen
by trial and error. The learning rate was kept constant throughout training, and the same training
procedure was used for both architectures. In addition, the mean squared error loss function is used to
assess the difference between the observed traffic flow and the predicted traffic flow. The loss function
is given in (9):

$$L(y_t, p_t) = \frac{1}{n} \sum_{t=1}^{n} (y_t - p_t)^2 \qquad (9)$$

where n is the number of training examples, $y_t$ is the observed traffic flow, and $p_t$ is the
predicted traffic flow.
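
As a rough illustration of how the loss in (9) and an RMSprop update interact, the sketch below fits a one-variable linear predictor to toy data. Only the learning rate of 0.0005 comes from the text. The decay factor `rho = 0.9`, the stabilizer `eps = 1e-7`, the linear predictor, the toy data, and the number of steps are assumptions chosen for the example, and the actual models are recurrent networks rather than a linear fit.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error, as in (9)."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


def rmsprop_update(param, grad, avg_sq, lr=0.0005, rho=0.9, eps=1e-7):
    """One RMSprop step for a scalar parameter; returns (new_param, new_avg_sq)."""
    # Running average of squared gradients (rho and eps are assumed values).
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    # Scale the step by the root of that average.
    param = param - lr * grad / (math.sqrt(avg_sq) + eps)
    return param, avg_sq


# Toy data generated from y = 0.8x + 0.1, used only for illustration.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.1, 0.3, 0.5, 0.7, 0.9]

w, b = 0.0, 0.0
avg_w, avg_b = 0.0, 0.0
initial_loss = mse(ys, [w * x + b for x in xs])

for _ in range(200):
    preds = [w * x + b for x in xs]
    n = len(xs)
    # Gradients of (9) with respect to w and b for the linear predictor.
    grad_w = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    grad_b = sum(2.0 * (p - y) for p, y in zip(preds, ys)) / n
    w, avg_w = rmsprop_update(w, grad_w, avg_w)
    b, avg_b = rmsprop_update(b, grad_b, avg_b)

final_loss = mse(ys, [w * x + b for x in xs])
print(initial_loss, final_loss)
```

Because RMSprop divides each gradient by a running estimate of its magnitude, the size of a step is governed mainly by the learning rate, which is why a small constant value such as 0.0005 gives slow but stable progress.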


3.2.3. Prediction result 
- Performance measures 

The prediction performance of the two models is evaluated using the following three measures. The
first measure is the mean absolute error (MAE), which is useful for continuous variable data. The second
measure is the root mean square error (RMSE), the standard deviation of the residuals (prediction errors).
The third measure is the mean absolute percentage error (MAPE), which measures the accuracy of the
prediction and is typically shown as a percentage. They are formulated as:
Mean absolute error:

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |y_t - p_t| \qquad (10)$$

Root mean square error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} |y_t - p_t|^2} \qquad (11)$$

Mean absolute percentage error:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|y_t - p_t|}{y_t} \qquad (12)$$

where $y_t$ is the observed traffic flow, and $p_t$ is the predicted traffic flow.
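
The three measures in (10)-(12) translate directly into code. The sketch below is a minimal illustration with hypothetical function names and sample values. It returns MAPE as a fraction, so multiplying by 100 gives a percentage, and it assumes that no observed value is zero, since (12) is undefined in that case.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, as in (10)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, as in (11)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, as in (12), returned as a fraction.

    Assumes every observed value is non-zero.
    """
    return sum(abs(y - p) / y for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted flows, used only for illustration.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(mae(observed, predicted))
print(rmse(observed, predicted))
print(100.0 * mape(observed, predicted))  # as a percentage
```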


4. EXPERIMENTAL WORK AND RESULTS 

The experiments are performed on two traffic flow datasets: PEMS [42] and TRANCOS [43]. The two
models are trained using the RMSprop optimizer with a base learning rate of α = 0.0005, chosen by trial
and error. The learning rate was kept constant throughout training, and the same training procedure was
used for both architectures. In Figure 2 and Figure 3, the input combination layer, the intermediate
combination layer, and the output combination layer all use the ReLU activation function. The three
combination layers share the same dropout rate, which is set to 0.2. The experimental results indicate
that the two proposed models give promising results: they perform well in specific cases and are able to
capture sudden trend changes in the traffic flow data.
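
The combination layers described above pair a ReLU activation with a dropout rate of 0.2. The sketch below shows what one such layer computes. It is an illustration only: the function names, the toy weights, and the choice of inverted dropout (scaling the kept activations by 1 / (1 - rate) during training) are assumptions, since the text does not state which dropout variant is used.

```python
import random


def relu(z):
    return max(0.0, z)


def dense_relu(inputs, W, b):
    """Fully connected layer followed by ReLU: relu(W . inputs + b)."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + bias)
            for row, bias in zip(W, b)]


def dropout(activations, rate=0.2, training=True, rng=random):
    """Inverted dropout (assumed variant): zero each unit with probability `rate`
    during training and rescale the rest; pass values through at inference."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical two-unit layer applied to a two-element feature vector.
features = [1.0, -2.0]
W = [[1.0, 1.0], [0.5, -0.5]]
b = [0.0, 0.0]

hidden = dense_relu(features, W, b)
train_out = dropout(hidden, rate=0.2, training=True, rng=random.Random(0))
eval_out = dropout(hidden, rate=0.2, training=False)
print(hidden, train_out, eval_out)
```

With a rate of 0.2, roughly one in five activations is zeroed on each training step, which discourages the dense layers from relying on any single feature.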

According to Figure 6, the GRU model performs strongly on the PEMS dataset and gives an accurate
prediction of the traffic flow at most of the predicted times. According to Figure 7, the LSTM model
performs strongly on the TRANCOS dataset and gives an accurate prediction of the traffic flow at most of
the predicted times. As shown in Table 1, the two models give promising results on the PEMS dataset in
terms of MAE, MAPE, and RMSE. As shown in Table 2, the two models also give promising results in terms of
MAE, MAPE, and RMSE on the TRANCOS dataset, which is used as metadata for the first time.
Figure 6. Performance comparison with the proposed models (LSTM and GRU) on PEMS dataset
Figure 7. Performance comparison with the proposed models (LSTM and GRU) on TRANCOS dataset


Table 1. The results on PEMS dataset 


Methods                       Year   MAE     MAPE (%)   RMSE
Stacked autoencoders [30]     2014   34.1    -          50.0
ARIMA [46]                    2016   19.17   -          29.0
LSTM NN [46]                  2016   18.12   -          26.64
GRU NN [46]                   2016   17.21   -          25.86
LSTM-M [48]                   2018   32.2    -          47.04
DNN-BTF [36]                  2018   19.12   -          27.91
LSTM [49]                     2018   7.21    16.56      9.90
GRU [49]                      2018   7.20    16.78      9.97
SAEs [49]                     2018   7.06    17.80      9.60
Method [39]                   2019   25.26   -          35.45
LSTM [50]                     2019   20.59   -          30.86
PLSTM [50]                    2019   19.98   -          30.21
PLSTM+ [50]                   2019   19.06   -          28.78
MSTGCN [51]                   2019   17.47   -          26.47
ASTGCN [51]                   2019   16.73   -          25.27
Linear regression [6]         2019   7.60    -          10.32
Multi-layer perceptron [6]    2019   11.26   -          13.63
RBF network [6]               2019   20.58   -          27.10
RBF regressor [6]             2019   7.21    -          9.72
SMO reg [6]                   2019   7.55    -          10.36
M2 [52]                       2020   20.88   -          33.12
LSTM-BILSTM [53]              2021   12.63   -          16.72
Federated learning [54]       2021   7.96    -          11.04
MLP-NN [55]                   2022   7.24    18.21      9.80
Proposed LSTM model
Proposed GRU model            2022   7.14    16.37      9.74


Table 2. The results on TRANCOS dataset 


Methods                       Year   MAE     MAPE (%)   RMSE
Proposed LSTM model           2022   10.78   28.36      8.23
Proposed GRU model            2022   11.88   28.48      8.82


5. CONCLUSION 

In this study, short-term traffic flow prediction is introduced for ITSs to facilitate higher-level
traffic scheduling, which enables travelers to save time and energy. The current methods for predicting
short-term traffic flow are incapable of effectively capturing the complex non-linearity of traffic flow,
which leads to low prediction accuracy. To overcome this problem, this study introduces two novel models
that deal with the non-linearity of traffic data to predict traffic flow more efficiently. The first model
uses an LSTM network followed by four dense layers, and the second model uses a GRU network followed
by three dense layers. The LSTM and GRU extract the temporal features of traffic flow, and the dense
layers perform the traffic flow prediction. The performance of the two models is evaluated using three
error metrics. The first metric is the MAE, which is used for continuous variable data. The second metric
is the MAPE, which measures the accuracy of the prediction and is typically shown as a percentage. The
third metric is the RMSE, which is the standard deviation of the residuals (prediction errors). The
experimental results in terms of MAE, MAPE, and RMSE indicate that the two models can properly capture
the temporal correlations in the correlated traffic series and make reliable predictions. The two models
do so in specific cases and are able to capture sudden trend changes. The proposed models provide improved
prediction performance compared to the existing methods.